Replication archive for "Child Welfare System Contact and Voting" for upload to JOP dataverse
Ariel White, Marie-Pascale Grimon, Rebecca Goldstein, Kelley Fong
Spring 2025
Email Ariel (arwhi@mit.edu) or Marie-Pascale (marie-pascale.grimon@sofi.su.se) with questions

This replication package contains the code used to produce all tables and figures in the paper and SI. However, due to the sensitivity of the raw data (individual records of contact with the child welfare system, acquired via a data-use agreement with a government agency), we are not able to provide the raw data used for these analyses. We provide the code here so that researchers can see how we processed and analyzed the data, and so that researchers who apply for access to the same datasets can then use our code to replicate and extend our analyses. 

The replication archive includes several files:

"0. Masterfile" is a do-file that runs all the stata do-file codes sequentially. 

"1. Merging CPS data sources.do" merges the different CPS data sources needed for the analyses. In particular, it locates the investigator for each referral and pulls in information on whether any children from the household were removed from the home. 

"2. Creating ID variable and addressing duplicates.do" creates a Masterfile for ID duplicates from the child protective services data. client_id is generated for all individuals reported to CPS whereas mci_id's are generated only when sufficient information is available about the person (typically name, gender and date of birth). mci_id is a variable that records duplicates based of client_id's and mci_uniq_id is the ID matched into these IDs from the data warehouse. To address duplicates, any observations that ever have the same mci_id or same mci_uniq_id are matched. A subset of suspicious looking merged duplicates were hand-checked using names and date of birth to determine whether it was indeed reasonable to consider them as duplicates. The new ID generated from this process is mci_uniq_id9 which we use in our later analyses. 

"3. Merging duplicates in outcome data.do"  merges outcomes for duplicate observations (voted if any duplicate ID voted, etc.) 

"4. Creating variables in CPS data.do" creates variables for the analysis. In particular, this is where the tendency to open a case and remove children are generated. 

"5. Merging CPS and Outcomes data.do" creates the sample restrictions and generates the Appendix Table A.1 which provides detail on the sample restrictions. 

"6. Paper Tables and Figures.do" generates all the tables and figures of the paper, with the exception of a few bar plots generated with the following R-file using excel sheets outputted from this stata do-file. 

"plottinginR.R" pulls in estimates calculated in "6. Paper Tables and Figures.do" and produces several bar plots for the main paper and SI (Figures 1, 2, A1).

In the "Data cleaning" folder, we include two excel files that are necessary for the data cleaning procedures conducted in do-files #1. "Allegation Groupings.xlsx" codes up the acronyms into allegation labels. "Mandated Reporter Groupings.xlsx" groups the open text descriptions of reporters into categories and based on those labels which ones are mandated reporters. 

